Logistic regression models to predict solvent accessible residues using sequence- and homology-based qualitative and quantitative descriptors applied to a domain-complete X-ray structure learning set
Abstract
A working example of relative solvent accessibility (RSA) prediction for proteins is presented. Novel logistic regression models have been built and validated using qualitative descriptors, including amino acid type, and quantitative descriptors, including 20- and six-term sequence entropy. A domain-complete learning set of over 1300 proteins is used to fit initial models with various sequence homology descriptors as well as query residue qualitative descriptors. Homology descriptors are derived from BLASTp sequence alignments, whereas the RSA values are determined directly from the crystal structure. The logistic regression models are fitted to dichotomous responses indicating whether a residue is buried or solvent accessible, with the binary classifications obtained from the RSA values. The fitted models yield binary predictions of residue solvent accessibility with accuracies comparable to those of other less computationally intensive methods, using the standard RSA thresholds of 20% and 25% to classify a residue as solvent accessible. When an additional non-homology descriptor describing Lobanov-Galzitskaya residue disorder propensity is included, incremental improvements in accuracy are achieved, with 25% threshold accuracies of 76.12% and 74.79% for the Manesh-215 and CASP(8+9) test sets, respectively. Moreover, the described software and the accompanying learning and validation sets allow students and researchers to explore the utility of RSA prediction with simple, physically intuitive models in any number of related applications.
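The pipeline the abstract describes (per-residue sequence entropy as a quantitative descriptor, an RSA threshold to produce a dichotomous response, and a logistic fit) can be illustrated with a minimal sketch. It is not the authors' software. The six-class amino acid grouping, the use of `>=` at the threshold, the gradient-descent fitting routine, and all function names are assumptions made for illustration, and the toy data are invented.

```python
import math
from collections import Counter

# ASSUMPTION: one commonly used six-class grouping; the abstract does not
# say which classes its six-term entropy uses.
SIX_CLASSES = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
AA_TO_CLASS = {aa: name for name, members in SIX_CLASSES.items() for aa in members}


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of an iterable of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def column_entropies(column):
    """Return (20-term, six-term) entropy of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa in AA_TO_CLASS]
    if not residues:
        return 0.0, 0.0
    return (shannon_entropy(residues),
            shannon_entropy(AA_TO_CLASS[aa] for aa in residues))


def is_accessible(rsa_percent, threshold=25.0):
    """Dichotomous response: 1 if RSA reaches the threshold (ASSUMPTION: >=), else 0."""
    return 1 if rsa_percent >= threshold else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x, cutoff=0.5):
    """Binary prediction: 1 (accessible) if the fitted probability reaches the cutoff."""
    p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
    return 1 if p >= cutoff else 0


# Invented toy data: one entropy-like feature per residue and made-up RSA values.
toy_features = [[0.0], [0.2], [1.8], [2.0]]
toy_rsa = [3.0, 12.0, 41.0, 60.0]
toy_labels = [is_accessible(r) for r in toy_rsa]
toy_weights = fit_logistic(toy_features, toy_labels)
```

Calling `predict(toy_weights, [1.9])` then classifies a new residue. In practice a statistics package would be used for the fit, and the homology descriptors would come from real BLASTp alignments rather than hand-typed columns.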
Similar resources
Quantitative Structure - Activity Relationships Study of Carbonic Anhydrase Inhibitors Using Logistic Regression Model
Binary logistic regression (BLR) models have been developed as non-linear models to establish quantitative structure-activity relationships (QSAR) between structural descriptors and the biochemical activity of carbonic anhydrase inhibitors. Using a training set consisting of 21 compounds with known Ki values, the model was trained and tested to solve two-class problems as active or inactive on the basi...
QSPR models to predict thermodynamic properties of some mono and polycyclic aromatic hydrocarbons (PAHs) using GA-MLR
Quantitative Structure-Property Relationship (QSPR) models for modeling and predicting thermodynamic properties, such as the enthalpy of vaporization at standard conditions (ΔH°vap, kJ mol⁻¹) and the normal boiling point temperature (T°bp, K), of 57 mono- and polycyclic aromatic hydrocarbons (PAHs) have been investigated. The PAHs were randomly separated into 2 groups: training and test sets. A set o...
Quantitative Structure-Property Relationship Modeling of the Redox Potential for Some Phenolic Antioxidants
In this work, quantitative structure-property relationship (QSPR) approaches were used to predict the redox potential of 42 phenolic antioxidants. The structures of all compounds were optimized by the AM1 semi-empirical method, and then a large number of molecular descriptors were calculated for each compound in the data set. Subsequently, stepwise multilinear regression was applied to select the mos...
Quantitative Structure-Activity Relationship Study on Thiosemicarbazone Derivatives as Antitubercular agents Using Artificial Neural Network and Multiple Linear Regression
Background and purpose: Nonlinear analysis methods for quantitative structure–activity relationship (QSAR) studies describe molecular behaviors better than linear analysis. Artificial neural networks are mathematical models and algorithms which imitate the information processing and learning of the human brain. Some S-alkyl derivatives of thiosemicarbazone are shown to be beneficial in prevention and...
Quantitative Structure Activity Relationship Analysis of Coumarins as Free Radical Scavengers by Genetic Function Algorithm
The antioxidant properties of coumarin derivatives in the 2,2′-diphenyl-1-picrylhydrazyl (DPPH) radical scavenging assay were investigated using Quantitative Structure Activity Relationship (QSAR) studies. The molecular structures were optimized and submitted for the generation of quantum chemical and molecular descriptors. Genetic Function Algorithm (GFA) was employed in m...